sflibrary(sf)
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
library(tmap)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(magrittr)
library(tidyr)
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:magrittr':
##
## extract
tmap_mode("view")
## tmap mode set to interactive viewing
The sf package supports a wide range of spatial operations both on individual layers, and between layers. These are discussed in this document, for your reference, and may be particularly useful when it comes to the mini-project later in the semester.
I have split operations into those which are performed between two datasets, and those which transform an existing dataset by some geometric method.
I will use three small datasets to demonstrate the various operations. These are loaded as follows:
And make a map to see what we are dealing with
tm_shape(polys) +
tm_polygons(col = "pop2013", alpha = 0.5) +
tm_shape(lines) +
tm_lines(col = 'purple') +
tm_shape(pts) +
tm_dots()
These data are:
polys 2013 Census meshblocks with the population countlines street and path network segmentspts road crash locations in 2017All for an area around 1.6km from Victoria University. If you make a web map you can see the general context better.
The spatial operation most closely related to data table operations is a spatial join. This operation depends on the spatial relations between two layers, which are called binary (spatial) predicates. So we need to look at the predicates before considering join operations.
The binary predicates determine whether or not various spatial conditions are met between two datasets. The binary predicates available in sf are listed here. These vary in their exact interpretation depending on whether the layers involved are points or lines or polygons, but an illustration of them for polygons is shown below, to give you the idea.
These are as discussed in the paper
Egenhofer MJ & Franzosa RD 1991 Point-set topological spatial relations International Journal of Geographical Information Systems 5(2) 161-174.
This wikipedia article provides a good overview of the underlying concepts.
The various binary spatial predicate functions in sf return for two datasets which geometries in the first dataset are related to which in the second based on the particular relation specified. For example, if we want to know which points in the crashes dataset are in which meshblocks, we would do
pts %>%
st_within(polys)
## Sparse geometry binary predicate list of length 575, where the
## predicate was `within'
## first 10 elements:
## 1: 123
## 2: 73
## 3: 267
## 4: 264
## 5: 91
## 6: 292
## 7: 68
## 8: 49
## 9: 213
## 10: 74
The list output is which of the polygons in the polys dataset the first 10 of the points in the pts data layer is within. If we do the equivalent test in reverse
polys %>%
st_contains(pts)
## Sparse geometry binary predicate list of length 406, where the
## predicate was `contains'
## first 10 elements:
## 1: (empty)
## 2: (empty)
## 3: (empty)
## 4: (empty)
## 5: 326
## 6: 118, 298, 478, 505
## 7: (empty)
## 8: 108, 134, 221, 295, 348, 451, 535
## 9: 143, 216, 439
## 10: (empty)
we get which points are contained by each of the polygons. In this case the first four polygons in polys contain no points from pts, the next one contains point 326, the next one contains several points, and so on.
The results we see here show only the first few rows of the results. They are also in a ‘sparse’ format showing only the relations which are TRUE. If we request the complete results by setting the option sparse to FALSE then we see a matrix which shows for every combination of object in the first dataset (the rows) which objects in the second dataset (the columns) satisfies the required spatial relationship.
st_contains(polys, pts, sparse = FALSE)[1:10, 1:10]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [7,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [10,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Usually the sparse form (the default) is less of a pain to deal with. In that form, the result is a list and extracting the information it contains may require some additional processing.
As an example, we can use the st_contains() operation to count points in polygons. For example, perhaps we want to count the number of crashes in different areas, then the lengths() function applied to the result of st_contains() gives us that information.
num_crashes <- polys %>%
st_contains(pts) %>%
lengths()
polys.n <- polys %>%
mutate(n = num_crashes)
tm_shape(polys.n) +
tm_polygons(col = "n")
If we want the actual information about which objects are in the result list from applying the spatial predicate, then we use the unlist() function to get at it. So for example, a list of the polygons that contain a crash is provided by
pts %>% st_within(polys) %>%
unlist() %>%
unique()
## [1] 123 73 267 264 91 292 68 49 213 74 195 76 11 153 271 194 186 144
## [19] 81 64 103 167 118 197 45 87 381 85 113 72 111 188 93 312 201 156
## [37] 94 184 172 13 146 245 229 405 193 333 226 334 154 147 102 282 216 145
## [55] 171 136 347 86 183 162 203 21 138 51 190 173 289 39 307 379 199 165
## [73] 8 99 101 310 255 6 191 364 248 254 204 284 151 97 225 9 137 205
## [91] 273 25 131 269 202 108 305 122 116 209 373 170 198 239 250 125 276 82
## [109] 132 65 398 100 187 371 90 345 189 71 325 401 326 363 327 182 200 150
## [127] 180 88 168 281 110 238 83 303 389 37 77 119 221 24 215 78 291 120
## [145] 275 141 96 158 357 84 224 5 404 374 67 356 128 35 149 126 69 117
## [163] 252 376 92 306 338 115 290 349 28 181 34 247 311 228 63 265 366 139
## [181] 40 142 155 328 105 62 382 75 251 324 208 152
The two key spatial operations between two datasets are spatial filter (st_filter) and spatial join (st_join).
A spatial filter operation is like the non-spatial filter function in dplyr and will pick out only those rows in the data that meet some criterion, where the criterion is a spatial one, based on the relationship to some other dataset. So for example, we can do a selection of the polygons that contain one or more points like this
polys %>%
st_filter(pts) %>%
tm_shape() +
tm_polygons() +
tm_shape(pts) +
tm_dots(size = 0.01)
st_filter applies some spatial predicate to the relationship between two layers to perform the filtering. By default the predicate is st_intersects which in most cases will pick out where two spatial layers coincide with one another.
As a different example, consider a situation where we have selected only one polygon in the dataset
poly <- polys[100, ] # this selects just row number 100 (we could pick any)
tm_shape(polys) +
tm_polygons() +
tm_shape(poly) +
tm_polygons(col = 'red')
Now we can get the neighbouring polygons like this, by specifying an appropriate predicate:
npolys <- polys %>%
st_filter(poly, predicate = st_touches)
tm_shape(polys) +
tm_polygons() +
tm_shape(npolys) +
tm_polygons(col = 'orange')
Note that the initially selected polygon is considered to ‘touch’ itself. It doesn’t arise in this simple example, but this capability can be used to perform complicated multistep spatial manipulations of datasets.
A spatial join not only applies some spatial predicate to the relation between two layers, but will append the data from the second layer to the data table of the first.
This is an effective way of spatially merging datasets for further analysis. An example of st_join is in the main lab materials. The trickiest aspect to spatial joins is when more than one object meets the join criterion. For example, if we are joining data from points to polygons then many points may intersect with each polygon:
st_join(polys, pts)
## Simple feature collection with 789 features and 9 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 1746199 ymin: 5425271 xmax: 1749916 ymax: 5430532
## Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
## First 10 features:
## MB2013 pop2013 CRASH_SEV MULTI_VEH LOCN1 LOCN2
## 1 2142900 9 <NA> <NA> <NA> <NA>
## 2 2124200 3 <NA> <NA> <NA> <NA>
## 3 2128800 177 <NA> <NA> <NA> <NA>
## 4 2119900 6 <NA> <NA> <NA> <NA>
## 5 2122600 0 N Multi vehicle MOLESWORTH ST LAMBTON QUAY
## 6 2131400 150 N Multi vehicle DIXON ST VICTORIA ST
## 6.1 2131400 150 N Multi vehicle VICTORIA ST FELTEX LANE
## 6.2 2131400 150 N Multi vehicle VICTORIA ST FELTEX LANE
## 6.3 2131400 150 N Multi vehicle DIXON ST WILLIS ST
## 7 2133407 36 <NA> <NA> <NA> <NA>
## fatalities serious minor geom
## 1 NA NA NA POLYGON ((1749174 5426376, ...
## 2 NA NA NA POLYGON ((1748753 5428642, ...
## 3 NA NA NA POLYGON ((1748281 5427565, ...
## 4 NA NA NA POLYGON ((1748645 5429096, ...
## 5 0 0 0 POLYGON ((1749061 5428876, ...
## 6 0 0 0 POLYGON ((1748521 5427439, ...
## 6.1 0 0 0 POLYGON ((1748521 5427439, ...
## 6.2 0 0 0 POLYGON ((1748521 5427439, ...
## 6.3 0 0 0 POLYGON ((1748521 5427439, ...
## 7 NA NA NA POLYGON ((1748813 5427143, ...
There are only 406 polygons in the polys dataset, but the joined dataset has 789!
This is because a duplicate entry is made each time a crash point occurs inside a polygon. In effect the table we get as output is a table of the ‘spatial joins’ between the two original datasets. Notice also that where a polygon contains no points we get <NA> entries in the table, because no data was found in the pts layer that intersected with the polys layer, but the polygon is retained anyway.
If we simply don’t want the <NA> values we can specify an ‘inner join’ so that only records where a match was found are retained. We do this by overriding the default ‘left join’ behaviour, which prioritises retaining all rows in the left hand (i.e. first) table.
st_join(polys, pts, left = FALSE)
## Simple feature collection with 575 features and 9 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 1746575 ymin: 5425845 xmax: 1749846 ymax: 5429182
## Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
## First 10 features:
## MB2013 pop2013 CRASH_SEV MULTI_VEH LOCN1 LOCN2
## 5 2122600 0 N Multi vehicle MOLESWORTH ST LAMBTON QUAY
## 6 2131400 150 N Multi vehicle DIXON ST VICTORIA ST
## 6.1 2131400 150 N Multi vehicle VICTORIA ST FELTEX LANE
## 6.2 2131400 150 N Multi vehicle VICTORIA ST FELTEX LANE
## 6.3 2131400 150 N Multi vehicle DIXON ST WILLIS ST
## 8 2134800 177 N Multi vehicle SH 1N TARANAKI ST
## 8.1 2134800 177 N Multi vehicle TARANAKI ST ABEL SMITH ST
## 8.2 2134800 177 N Multi vehicle TARANAKI ST ARTHUR ST
## 8.3 2134800 177 N Multi vehicle SH 1N TARANAKI ST
## 8.4 2134800 177 N Multi vehicle SH 1N TARANAKI ST
## fatalities serious minor geom
## 5 0 0 0 POLYGON ((1749061 5428876, ...
## 6 0 0 0 POLYGON ((1748521 5427439, ...
## 6.1 0 0 0 POLYGON ((1748521 5427439, ...
## 6.2 0 0 0 POLYGON ((1748521 5427439, ...
## 6.3 0 0 0 POLYGON ((1748521 5427439, ...
## 8 0 0 0 POLYGON ((1748721 5426755, ...
## 8.1 0 0 0 POLYGON ((1748721 5426755, ...
## 8.2 0 0 0 POLYGON ((1748721 5426755, ...
## 8.3 0 0 0 POLYGON ((1748721 5426755, ...
## 8.4 0 0 0 POLYGON ((1748721 5426755, ...
This still leaves the question of how to combine the multiple pieces of information that are now associated with each polygon derived from multiple crashes. There is no one solution here as it will depend on the interpretation of the data what makes sense. Usually we have to group the joined data based on a unique identifier from the original (left hand) data table, and then use summarise to calculate appropriate new variables. For example
crash_counts <- polys %>%
st_join(pts) %>%
group_by(MB2013) %>% # this variable uniquely identifies each row
summarise(num_crashes = n(), # this counts the number of matching cases
fatalities = sum(fatalities),
serious = sum(serious),
minor = sum(minor),
pop2013 = first(pop2013)) %>%
select(MB2013, pop2013, num_crashes, fatalities, serious, minor)
as_tibble(crash_counts)
## # A tibble: 406 x 7
## MB2013 pop2013 num_crashes fatalities serious minor geom
## <chr> <int> <int> <int> <int> <int> <POLYGON [m]>
## 1 21019… 0 1 NA NA NA ((1748508 5430325, 17485…
## 2 21024… 3 1 NA NA NA ((1747656 5429203, 17476…
## 3 21025… 198 1 NA NA NA ((1747332 5429467, 17473…
## 4 21028… 174 1 NA NA NA ((1747086 5429464, 17470…
## 5 21030… 21 1 NA NA NA ((1747462 5429071, 17474…
## 6 21030… 51 1 NA NA NA ((1747558 5429004, 17475…
## 7 21055… 0 1 NA NA NA ((1746865 5428490, 17468…
## 8 21061… 39 2 0 0 2 ((1746734 5428512, 17467…
## 9 21111… 132 3 0 0 1 ((1746722 5428375, 17467…
## 10 21112… 243 1 NA NA NA ((1746568 5428275, 17465…
## # … with 396 more rows
The biggest problem this leaves us with is a lot of NA values. The best way to deal with these is to use the tidyr::replace_na() function in the summarise() calculation to replace NA values with an appropriate alternative. For example, we can do
crash_counts <- polys %>%
st_join(pts) %>%
group_by(MB2013) %>%
summarise(num_crashes = n(),
fatalities = replace_na(sum(fatalities), 0),
serious = replace_na(sum(serious), 0),
minor = replace_na(sum(minor), 0),
pop2013 = first(pop2013)) %>%
select(MB2013, pop2013, num_crashes, fatalities, serious, minor)
as_tibble(crash_counts)
## # A tibble: 406 x 7
## MB2013 pop2013 num_crashes fatalities serious minor geom
## <chr> <int> <int> <dbl> <dbl> <dbl> <POLYGON [m]>
## 1 21019… 0 1 0 0 0 ((1748508 5430325, 17485…
## 2 21024… 3 1 0 0 0 ((1747656 5429203, 17476…
## 3 21025… 198 1 0 0 0 ((1747332 5429467, 17473…
## 4 21028… 174 1 0 0 0 ((1747086 5429464, 17470…
## 5 21030… 21 1 0 0 0 ((1747462 5429071, 17474…
## 6 21030… 51 1 0 0 0 ((1747558 5429004, 17475…
## 7 21055… 0 1 0 0 0 ((1746865 5428490, 17468…
## 8 21061… 39 2 0 0 2 ((1746734 5428512, 17467…
## 9 21111… 132 3 0 0 1 ((1746722 5428375, 17467…
## 10 21112… 243 1 0 0 0 ((1746568 5428275, 17465…
## # … with 396 more rows
Finally, you may want to clip one layer with another. This is done using the st_intersection function. Here’s how it works, using the convex hull of a set of points (we’ll see what the convex hull is in a moment).
# Clipping the polygons by the convex hull of the points
hull <- pts %>%
st_union() %>%
st_convex_hull()
polys %>%
st_intersection(hull) %>%
plot()
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
The unary operations we can apply to geometries in a dataset are listed here (and you can look up any of them in the help in the usual way). Some examples are discussed below. First we consider operations that apply to all objects in a dataset inidividually.
st_bufferBuffering expands spatial objects by some specified distance out from the objects. You have to specify a dist.
pts.buf <- st_buffer(pts, dist = 50)
polys.buf <- st_buffer(polys, dist = -20)
lines.buf <- st_buffer(lines, dist = 20)
This function preserves any variables associated with the dataset in the buffered dataset. A set of points will become a set of circles when buffered, whereas lines become ‘lozenge’ shapes, and polygons become larger, smoothed versions of themselves.
Buffering is frequently used to explore topics like the spatial relationship between (for example) liquor outlets and schools, or accessibility to public transport and so on.
m1 <- tm_shape(lines.buf, alpha = 0.75) +
tm_polygons()
m2 <- tm_shape(pts.buf, alpha = 0.75) +
tm_polygons()
m3 <- tm_shape(polys.buf, alpha = 0.75) +
tm_polygons()
tmap_arrange(m1, m2, m3, nrow = 1)
## Warning: The shape polys.buf contains empty units.
If dist is negative (as it is here for the meshblocks) it will shrink the geometry in on itself. This only makes sense for polygons, and if the shrinking is too large it may cause errors or warnings (as small polygons collapse to nothing or turn inside-out…).
st_centroidThe centroid of a geometry is a point at its geometric centre.
tm_shape(polys) +
tm_polygons() +
tm_shape(st_centroid(polys)) +
tm_dots()
## Warning in st_centroid.sf(polys): st_centroid assumes attributes are constant
## over geometries of x
The centroid is not guaranteed to be inside the geometry. There is at least one example of this in the example data - and most likely more.
If you need a representative point guaranteed to be inside a polygon, then use st_point_on_surface() instead, although be aware that the location of this point is not guaranteed to be the same every time your run the function!
The convex hull of a set of points is the shape you would get if you stretch an elastic band around the outer edge of the set of points that define the object. If we extract a single polygon from the polys dataset, that’s the easiest way to illustrate this.
poly <- polys %>%
filter(MB2013 == 2124300)
Now plot this and its convex hull
tm_shape(poly) +
tm_polygons() +
tm_shape(st_convex_hull(poly)) +
tm_polygons(col = 'blue', alpha = 0.35, border.col = 'red')
The convex hull of a geometry is a nice simplified summary of its location. It is particularly useful for representing a collection of points. However, if you apply it to a point dataset, you will just get a collection of points, because the function is applied to each point individually.
To create the convex hull of the whole set of points, you have to first merge them into a multipoint with the st_union operation:
pts %>%
st_union() %>%
st_convex_hull() %>%
tm_shape() +
tm_polygons(alpha = 0.5, border.col = "red") +
tm_shape(pts) +
tm_dots()
Some operations only make sense when applied to a collection of geometries, not one object at a time. Two in particular that are widely used are st_voronoi() and st_triangulate().
The Voronoi region associated with a geometry is the region of space closer to that geometry than to any other geometry in a set of geometries. This is easiest to understand for points.
Unfortunately, the st_voronoi operation applied to a point dataset behaves rather unexpectedly, so a series of transformations are required to convert it to a useful polygon dataset:
voronoi <- pts %>%
st_union() %>% # combine points into a multipoint
st_voronoi() %>%
st_cast() %>% # no idea!
st_as_sf() # convert to a sf dataset
And now let’s plot them along with the originating points.
tm_shape(voronoi) +
tm_polygons(alpha = 0.5) +
tm_shape(pts) +
tm_dots()
By default the function adds a large buffer region to the points before doing the Voronoi calculation. You can reduce this effect by intersecting the result with a bounding box or convex hull:
pts.hull <- pts %>%
st_union() %>%
st_convex_hull() %>%
st_buffer(dist = 100)
voronoi %>% st_intersection(pts.hull) %>%
tm_shape() +
tm_polygons() +
tm_shape(pts) +
tm_dots()
The Voronoi polygons of a set of points can be considered ‘service areas’ of the points. Imagine the points are (say) bus stops, then each polygon is the area closer to a given bus stop than to any other bus stop, so other things being equal, it is the ‘service area’ of that bus stop.
Of course… the areas are determined without reference to movement on street networks or hills or anything else, so they are too simplistic for application in an unprocessed form. Alternative methods involve calculating distances over networks not just using straight point-to-point calculations.
An interesting example of the simple form of Voronoi, but applied to true great circle distances on Earth surface is this one by Jason Davies for world airports.
The triangulation of a set of points connects the points together into a mesh of triangles. Similar to the Voronoi operation, some annoying additional conversions are required.
tri <- pts %>%
st_union() %>% # combine potnts into a multipoint
st_triangulate() %>%
st_cast() %>% # no idea!
st_as_sf() # convert to a sf dataset
tm_shape(tri) +
tm_polygons(alpha = 0.5) +
tm_shape(pts) +
tm_dots()
The triangulation, the Voronoi polygons, and the convex hull are all closely related. The convex hull of the points is also the outer perimeter of the triangulation.
An alternative way to summarise the locations of a collection of points is called the alphashape which is based on removing edges from the triangulation until the remaining triangles more closely fit the shape of the set of points. However alphashapes are not supported by any functions in the sf package. An R package that supports alphashapes is alphahull but it is not directly compatible with sf. If you want to see what this looks likes try running the following code:
library(alphahull)
## Registered S3 method overwritten by 'spatstat.geom':
## method from
## print.boxx cli
xy <- st_coordinates(pts)
xy <- unique(xy) # there are some duplicate points in the data, which we need to remove
plot(ashape(xy, alpha = 350), asp = 1)
library(RXKCD)
getXKCD(627)$img
## Warning in readPNG(paste(tempdir(), "xkcd.png", sep = "/")): libpng warning:
## iCCP: profile 'Photoshop ICC profile': 'RGB ': RGB color space not permitted on
## grayscale PNG
## [1] "https://imgs.xkcd.com/comics/tech_support_cheat_sheet.png"